Abstract: Marton's inner bound is the best known achievable
region for a general discrete memoryless broadcast channel. To 
compute Marton's inner bound one has to solve an optimization 
problem over a set of joint distributions on the input and 
auxiliary random variables. The optimizers turn out to be 
structured in many cases. Finding properties of optimizers not 
only results in efficient evaluation of the region, but it may also 
help one to prove factorization of Marton's inner bound (and 
thus its optimality). The first part of this paper formulates this 
factorization approach explicitly and states some conjectures and 
results along this line. The second part of this paper focuses 
primarily on the structure of the optimizers. This section is 
inspired by a new binary inequality that recently resulted in 
a very simple characterization of the sum-rate of Marton's inner 
bound for binary input broadcast channels. This prompted us 
to investigate whether this inequality can be extended to larger 
cardinality input alphabets. We show that several of the results 
for the binary input case do carry over for higher cardinality 
alphabets and we present a collection of results that help restrict 
the search space of probability distributions to evaluate the 
boundary of Marton's inner bound in the general case. We also 
prove a new inequality for the binary skew-symmetric broadcast 
channel that yields a very simple characterization of the entire 
Marton inner bound for this channel. 

I. Introduction 

A broadcast channel [1] models a communication scenario
where a single sender wishes to communicate multiple messages
to many receivers. A two-receiver discrete memoryless
broadcast channel consists of a sender X and two receivers
Y, Z. The sender maps a pair of messages $M_1, M_2$ to a transmit
sequence $X^n(m_1, m_2) \in \mathcal{X}^n$ and the receivers each get
a noisy version $Y^n \in \mathcal{Y}^n$, $Z^n \in \mathcal{Z}^n$ respectively. Further
$|\mathcal{X}|, |\mathcal{Y}|, |\mathcal{Z}| < \infty$ and $p(y^n, z^n | x^n) = \prod_{i=1}^{n} q(y_i, z_i | x_i)$. For
more details on this model and a collection of known results 
please refer to Chapters 5 and 8 in 0. We also adopt most 
of our notation from this book. 

The best known achievable rate region for a broadcast
channel is the following inner bound due to Marton. Here we
consider the private messages case.

Bound 1. (Marton) The union of rate pairs $(R_1, R_2)$ satisfying
the constraints

$R_1 \le I(U,W;Y)$,
$R_2 \le I(V,W;Z)$,
$R_1 + R_2 \le \min\{I(W;Y), I(W;Z)\} + I(U;Y|W) + I(V;Z|W) - I(U;V|W)$,

for any triple of random variables (U, V, W) such that
$(U,V,W) \to X \to (Y,Z)$ is achievable. Further, to compute
this region it suffices to consider $|\mathcal{W}| \le |\mathcal{X}| + 4$, $|\mathcal{U}| \le |\mathcal{X}|$,
$|\mathcal{V}| \le |\mathcal{X}|$.

It is not known whether this region is the true capacity 
region since the traditional Gallager-type technique for proving 
converses fails to work in this case. This raises the question of 
whether Marton's inner bound has an alternative representation 
that is better amenable to analysis. We believe that central to 
answering this question is understanding properties of joint 
distributions p(u,v,w,x) corresponding to extreme points 
of Marton's inner bound. Our approach to this is twofold. 
Roughly speaking in the first part of this paper we find suf- 
ficient conditions on the optimizing distributions p(u,v,w,x) 
which would imply a kind of factorization of Marton's inner 
bound. Such a factorization would imply that Marton's region 
is the correct rate region. In the second part we find necessary 
conditions on any optimizing p(u,v,w,x). Unfortunately the 
gap between these sufficient and necessary conditions is still 
wide. However we discuss how the necessary conditions may 
enhance our understanding of the maximizers of the expression 
I(U; Y) + I(V; Z) - I(U; V) and how it may be useful in 
proving the factorization of Marton's inner bound. 

A. Necessary conditions 

The question of whether Marton's inner bound matches one
of the known outer bounds has been studied in several recent
works. Since we build upon these
results in this work, a brief literature review is in order. It
was shown in [6] that a gap exists between Marton's inner
bound and the best-known outer bound [9] for the binary
skew-symmetric broadcast channel (BSSC) (Fig. 1) if a certain
binary inequality, (1) below, holds. A gap between the bounds
was demonstrated for the BSSC without explicitly
having to evaluate the inner bound. The conjectured inequality
for this channel was established, and hence Marton's
sum-rate for the BSSC was explicitly evaluated. The inequality
was shown in [8] to hold for all binary input broadcast channels,
thus giving an alternate representation of Marton's sum-rate
for binary input broadcast channels.

Theorem 1. [8] For all random variables (U, V, X, Y, Z) such
that $(U,V) \to X \to (Y,Z)$ and $|\mathcal{X}| = 2$ the following holds:

$I(U;Y) + I(V;Z) - I(U;V) \le \max\{I(X;Y), I(X;Z)\}$. (1)

This yields the following immediate corollary. 



Corollary 1. [8] The maximum sum-rate achievable by Marton's
inner bound for any binary input broadcast channel is
given by

$\max_{p(w,x)} \min\{I(W;Y), I(W;Z)\} + P(W=0) I(X;Y|W=0) + P(W=1) I(X;Z|W=1)$.

Here $\mathcal{W} = \{0, 1\}$.

Note that this characterization is much simpler than the one
given in Bound 1.
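Inequality (1) can be spot-checked numerically. The following is a minimal sketch, assuming NumPy; the function names (`mutual_information`, `inequality_gap`, `worst_gap`) and the choice of ternary outputs are illustrative and not from the original work. It samples random $p(u,v)$, $p(x|u,v)$ with binary X, and random channels, and reports the largest observed value of LHS minus RHS, which should not exceed zero beyond rounding error.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

def inequality_gap(p_uv, p_x_given_uv, q_y_given_x, q_z_given_x):
    """LHS minus RHS of inequality (1) for (U,V) -> X -> (Y,Z)."""
    p_uvx = p_uv[:, :, None] * p_x_given_uv        # shape (|U|, |V|, |X|)
    p_x = p_uvx.sum(axis=(0, 1))
    p_uy = p_uvx.sum(axis=1) @ q_y_given_x          # joint pmf of (U, Y)
    p_vz = p_uvx.sum(axis=0) @ q_z_given_x          # joint pmf of (V, Z)
    p_xy = p_x[:, None] * q_y_given_x
    p_xz = p_x[:, None] * q_z_given_x
    lhs = (mutual_information(p_uy) + mutual_information(p_vz)
           - mutual_information(p_uv))
    rhs = max(mutual_information(p_xy), mutual_information(p_xz))
    return lhs - rhs

def worst_gap(trials=2000, seed=0):
    """Largest LHS - RHS seen over random binary-input instances."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        p_uv = rng.dirichlet(np.ones(4)).reshape(2, 2)
        p_x_given_uv = rng.dirichlet(np.ones(2), size=(2, 2))  # binary X
        q_y = rng.dirichlet(np.ones(3), size=2)   # rows are q(y|x)
        q_z = rng.dirichlet(np.ones(3), size=2)   # rows are q(z|x)
        worst = max(worst, inequality_gap(p_uv, p_x_given_uv, q_y, q_z))
    return worst

print(worst_gap())
```

Only the marginal channels $q(y|x)$ and $q(z|x)$ are sampled, since the expression in (1) depends on the broadcast channel only through them.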

Our results on the necessary conditions of an optimizer 
attempt to extend the new binary inequality to larger alphabets 
and to the entire rate region (rather than just the sum rate). 

B. Sufficient conditions 

Suppose we have certain properties of the distributions p(u,v,w,x) that
maximize Marton's inner bound. How can one use this to
prove that Marton's inner bound is tight? The traditional
Gallager-type technique requires us to consider the n-letter
expression and to try to identify single-letter auxiliary random
variables. If any such statement can be shown, it has to hold
for n = 2 in particular. In [10], the authors studied Marton's
inner bound (sum-rate) via a two-letter approach and there 
they presented an approach to test whether Marton's inner 
bound is indeed optimal. The crux of the paper [ 1 1 is a 
certain factorization idea which if established would yield the 
optimality of Marton's inner bound for discrete memoryless 
broadcast channels. Further the authors used the same idea 
to show in [11] an example of a class of broadcast channels
where Marton's inner bound is tight and the best known
outer bounds are strictly loose.¹ The converse to the capacity
region of this class of broadcast channels was motivated by
the factorization approach. The authors also showed that the
factorizing approach works if an optimizer $p(u, v, w, x_1, x_2)$ for
the two-letter Marton's inner bound satisfies certain conditions.

In this paper we provide more sufficient conditions that 
imply factorization by forming a more refined version of the 
two-letter approach [11 1. Simulations conducted on randomly 
generated binary input broadcast channels indicate that per-
haps the factorization stated below (Conjecture 1) is true, thus
indicating that Marton's inner bound could be optimal.

For any broadcast channel q(y,z|x), define

$T(X) := \max_{p(u,v|x)} I(U;Y) + I(V;Z) - I(U;V)$.

Note that T(X) is a function of p(x) for a given broadcast
channel. Similarly, for any function f(X) defined on p(x),
denote by

$\mathfrak{C}[f(X)] := \max_{p(v|x)} \sum_{v} p(v) f(X | V = v)$

the upper concave envelope of f(X) evaluated at p(x). (Note
that one can restrict the maximization to $|\mathcal{V}| \le |\mathcal{X}|$ by Fenchel-
Caratheodory arguments.) A 2-letter broadcast channel is a

1 The previous works established a gap between the bounds and in this work 
it was shown that the outer bounds (both in the presence and absence of a 
common message) are strictly sub-optimal. 



product broadcast channel whose transition probability is given
by $q_1(y_1, z_1 | x_1) q_2(y_2, z_2 | x_2)$; i.e. the components can be considered as
parallel non-interfering broadcast channels. For this channel
the function $T(X_1, X_2)$ is defined similarly as

$\max_{p(u,v|x_1,x_2)} I(U; Y_1, Y_2) + I(V; Z_1, Z_2) - I(U;V)$.

Conjecture 1. For all product channels, for all $\lambda \in [0, 1]$ and
for all $p(x_1, x_2)$ the following holds:

$-\bar{\lambda} H(Y_1,Y_2) - \lambda H(Z_1,Z_2) + T(X_1,X_2) \le \mathfrak{C}[-\bar{\lambda} H(Y_1) - \lambda H(Z_1) + T(X_1)] + \mathfrak{C}[-\bar{\lambda} H(Y_2) - \lambda H(Z_2) + T(X_2)]$,

where $\bar{\lambda} = 1 - \lambda$.
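For a binary input alphabet, p(x) is described by the single number P(X = 0), so the upper concave envelope of a function of p(x) can be approximated on a grid. The following is a minimal sketch, assuming NumPy; the helper name `upper_concave_envelope` and the grid-based upper-hull approach are illustrative choices, not the method used for the simulations mentioned in the text.

```python
import numpy as np

def upper_concave_envelope(xs, fs):
    """Upper concave envelope of the points (xs[i], fs[i]); xs must be increasing."""
    xs = np.asarray(xs, dtype=float)
    fs = np.asarray(fs, dtype=float)
    hull = []  # indices of the upper hull, scanned left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # b is dropped when it lies on or below the chord from a to i
            cross = ((xs[b] - xs[a]) * (fs[i] - fs[a])
                     - (fs[b] - fs[a]) * (xs[i] - xs[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # piecewise-linear interpolation through the hull points
    return np.interp(xs, xs[hull], fs[hull])

grid = np.linspace(0.0, 1.0, 101)          # values of P(X = 0)
envelope = upper_concave_envelope(grid, grid ** 2)
print(envelope[50])                        # envelope of s -> s^2 at s = 0.5
```

A convex function is replaced by the chord between its endpoints, while a concave function is returned unchanged.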

Remark 1. The above conjecture was not formally stated in
[10] as the authors did not have enough numerical evidence
at that point; however subsequently the evidence has grown
enough for some of the authors to have reasonable confidence
in the validity of the above statement.

It was shown in [10] that if Conjecture 1 holds then Marton's
inner bound would yield the optimal sum-rate for a two-
receiver discrete memoryless broadcast channel. Hence estab-
lishing the veracity of the conjecture becomes an important
direction in studying the optimality of Marton's inner bound.

The validity of Conjecture 1 was established in [10] in the
following three instances:

1) $\lambda = 0$, $\lambda = 1$, i.e. the extreme points of the interval,

2) If one of the four channels, say $X_1 \to Y_1$, is deterministic,

3) In one of the components, say the first, receiver $Y_1$ is
more capable² than receiver $Z_1$.

Note that to establish the conjecture one needs to get a
better handle on T(X). What inequality (1) shows is that when
$|\mathcal{X}| = 2$ then

$T(X) = \max\{I(X;Y), I(X;Z)\}$.

In this work, we seek generalizations of the inequality (1) in
two different directions:

• To the entire private messages region: Maximizing
I(U;Y) + I(V;Z) - I(U;V) for a given p(x) is related
to the sum-rate computation of Marton's inner bound. If
one is interested in the entire private messages region,
one must deal with a slightly more general form and this
is presented in Section I-B1.

• Beyond binary input alphabets: The inequality (1) itself
fails to hold when $|\mathcal{X}| = 3$, for instance in the Blackwell
channel³. Therefore, we attempt to establish properties of
the optimizing distributions p(u,v|x) that achieve T(X),
in Section III.

2 A receiver Y is said to be more capable [12] than receiver Z if $I(X;Y) \ge I(X;Z)$ $\forall p(x)$.

3 The Blackwell channel is a deterministic broadcast channel with $\mathcal{X} = \{0,1,2\}$, with the mapping $\mathcal{X} \to \mathcal{Y} \times \mathcal{Z}$ given by: $0 \mapsto (0,0)$, $1 \mapsto (0,1)$, $2 \mapsto (1,1)$.
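The failure of inequality (1) for the Blackwell channel can be seen with a short computation. The following is a minimal sketch, assuming NumPy, a uniform input distribution, and the particular (not necessarily optimal) choice U = Y, V = Z; these choices are illustrative assumptions. With them, I(U;Y) + I(V;Z) - I(U;V) = H(Y) + H(Z) - I(Y;Z) = H(Y,Z), while I(X;Y) = H(Y) and I(X;Z) = H(Z) because the channel is deterministic.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

blackwell = {0: (0, 0), 1: (0, 1), 2: (1, 1)}   # x -> (y, z)
p_x = np.array([1 / 3, 1 / 3, 1 / 3])           # uniform input (assumption)

p_yz = np.zeros((2, 2))
for x, (y, z) in blackwell.items():
    p_yz[y, z] += p_x[x]

h_y = entropy(p_yz.sum(axis=1))
h_z = entropy(p_yz.sum(axis=0))
lhs = entropy(p_yz)        # H(Y,Z), the value achieved by U = Y, V = Z
rhs = max(h_y, h_z)        # max{I(X;Y), I(X;Z)}
print(lhs, rhs)
```

Here the left-hand side equals $\log_2 3$ bits, which exceeds the binary entropy of 1/3 on the right-hand side.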



1) A generalized conjecture: Much of the work in [10]
focused on the sum-rate. If one is interested in proving
the optimality of the entire rate region (for the private message
case) then establishing the following equivalent conjecture
would be sufficient. For $\alpha \ge 1$ define

$T_\alpha(X) := \max_{p(u,v|x)} \alpha I(U;Y) + I(V;Z) - I(U;V)$.

Conjecture 2. For all product channels, for all $\lambda \in [0, 1]$, for
all $\alpha \ge 1$, and for all $p(x_1, x_2)$ the following holds:

$-(\alpha - \lambda) H(Y_1,Y_2) - \lambda H(Z_1,Z_2) + T_\alpha(X_1,X_2) \le \mathfrak{C}[-(\alpha - \lambda) H(Y_1) - \lambda H(Z_1) + T_\alpha(X_1)] + \mathfrak{C}[-(\alpha - \lambda) H(Y_2) - \lambda H(Z_2) + T_\alpha(X_2)]$.

Remark 2. The sufficiency of the conjecture in proving the
optimality of Marton's inner bound follows from a 2-letter
argument similar to that found in [10]. However this conjecture
is not equivalent to proving the optimality of Marton's inner
bound; indeed it is a stronger statement.

II. Sufficient conditions

A sufficient condition beyond those established in [10] that
implies factorization is the following:

Claim 1. For some $p(x_1, x_2)$ and a product channel, if we
have a $p(u,v|x_1,x_2)$ such that

$T(X_1,X_2) = I(U; Y_1, Y_2) + I(V; Z_1, Z_2) - I(U;V)$,

and further $P(X_2 = x_2 | U = u) \in \{0,1\}$ $\forall u, x_2$, then the
factorization conjecture holds.

Proof: Observe that (by elementary manipulations) we
have

$-(\alpha - \lambda) H(Y_1,Y_2) - \lambda H(Z_1,Z_2) + \alpha I(U; Y_1, Y_2) + I(V; Z_1, Z_2) - I(U;V)$
$= -(\alpha - \lambda) H(Y_1|Z_2) - \lambda H(Z_1|Z_2) + \alpha I(U; Y_1|Z_2) + I(V; Z_1|Z_2) - I(U; V|Z_2)$
$\quad - (\alpha - \lambda) H(Y_2|Y_1) - \lambda H(Z_2|Y_1) + \alpha I(U; Y_2|Y_1) + I(V; Z_2|U, Y_1)$
$\quad - (\alpha - 1) I(Y_1; Z_2|U) - I(Y_1; Z_2|U, V)$.

Since $X_2$ is a function of U we have

$\alpha I(U; Y_2|Y_1) + I(V; Z_2|U, Y_1) - (\alpha - 1) I(Y_1; Z_2|U) - I(Y_1; Z_2|U, V) = \alpha I(X_2; Y_2|Y_1)$.

Hence

$-(\alpha - \lambda) H(Y_1,Y_2) - \lambda H(Z_1,Z_2) + T_\alpha(X_1,X_2)$
$= -(\alpha - \lambda) H(Y_1|Z_2) - \lambda H(Z_1|Z_2) + \alpha I(U; Y_1|Z_2) + I(V; Z_1|Z_2) - I(U; V|Z_2)$
$\quad - (\alpha - \lambda) H(Y_2|Y_1) - \lambda H(Z_2|Y_1) + \alpha I(X_2; Y_2|Y_1)$
$\le \mathfrak{C}[-(\alpha - \lambda) H(Y_1) - \lambda H(Z_1) + T_\alpha(X_1)] + \mathfrak{C}[-(\alpha - \lambda) H(Y_2) - \lambda H(Z_2) + T_\alpha(X_2)]$.

■

Remark 3. The main purpose of this claim is to demonstrate
that if the distributions p(u,v|x) that achieve T(X) (we
will refer to them as extremal distributions) satisfy certain
properties, then we could employ these properties to establish
the conjecture. In this paper we will establish some such
properties of the extremal distributions.

Fig. 1. The binary skew-symmetric broadcast channel

A. A conjecture for binary alphabets 

A natural guess for extending the inequality (1), so as to
compute $T_\alpha(X)$, is the following: for any $\alpha \ge 1$, for all
random variables (U, V, X, Y, Z) such that $(U,V) \to X \to (Y,Z)$
and $|\mathcal{X}| = 2$, the following holds:

$\alpha I(U;Y) + I(V;Z) - I(U;V) \le \max\{\alpha I(X;Y), I(X;Z)\}$. (2)

However this inequality turns out to be false in general. A
counterexample is presented in Appendix B.

However the inequality is true in the following cases:

1) If $\alpha \le 1$ then the inequality in (2) holds: To see this let
Y' be obtained from Y by erasing each received symbol
with probability $1 - \alpha$. It is straightforward to see that
$I(U;Y') = \alpha I(U;Y)$ and $I(X;Y') = \alpha I(X;Y)$. Since

$I(U;Y') + I(V;Z) - I(U;V) \le \max\{I(X;Y'), I(X;Z)\}$,

the inequality holds.

2) If $\alpha > 1$, at any p(x) where $I(X;Y) \ge I(X;Z)$ the
inequality holds since

$(\alpha - 1) I(U;Y) \le (\alpha - 1) I(X;Y)$,
$I(U;Y) + I(V;Z) - I(U;V) \le I(X;Y)$.
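The scaling used in the first case, namely that erasing each output symbol independently with probability $1 - \alpha$ multiplies mutual information by $\alpha$, can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `erase`, the alphabet sizes and the value of `alpha` are illustrative assumptions.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

def erase(q_y_given_x, alpha):
    """Channel to Y': Y is kept with probability alpha, else replaced by an erasure symbol."""
    n_in = q_y_given_x.shape[0]
    return np.hstack([alpha * q_y_given_x, np.full((n_in, 1), 1.0 - alpha)])

rng = np.random.default_rng(1)
p_ux = rng.dirichlet(np.ones(6)).reshape(3, 2)   # joint pmf of (U, X), binary X
q_y = rng.dirichlet(np.ones(4), size=2)          # rows are q(y|x)
alpha = 0.35
p_x = p_ux.sum(axis=0)

i_uy = mutual_information(p_ux @ q_y)
i_uy_erased = mutual_information(p_ux @ erase(q_y, alpha))
i_xy = mutual_information(p_x[:, None] * q_y)
i_xy_erased = mutual_information(p_x[:, None] * erase(q_y, alpha))
print(i_uy_erased / i_uy, i_xy_erased / i_xy)    # both ratios equal alpha
```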

The inequality in Equation (2) also holds for the binary
skew-symmetric broadcast channel shown in Figure 1 (we
assume p = 1/2), quite possibly the simplest channel whose
capacity region is not established. The proof is presented in
Appendix A.

By establishing Equation (2) for this channel, we are now
able to precisely characterize Marton's inner bound region for
this channel. In particular it is straightforward to see that for
$\alpha \ge 1$, if $\mathcal{M}$ represents Marton's inner bound, then

$\max_{(R_1,R_2) \in \mathcal{M}} \alpha R_1 + R_2 = \max_{p(w,x)} \big\{ \min\{I(W;Y), I(W;Z)\} + (\alpha - 1) I(W;Y) + \alpha P(W=0) I(X;Y|W=0) + P(W=1) I(X;Z|W=1) \big\}$.

A similar statement holds when the roles of Y, Z are
interchanged.

Based on simulations and other evidence we propose the 
following conjecture. 

Conjecture 3. For all $\alpha \ge 1$, for all $(U,V) \to X \to (Y,Z)$
with $|\mathcal{X}| = 2$, we have

$-(\alpha - \lambda) H(Y) - \lambda H(Z) + T_\alpha(X) \le \mathfrak{C}[-(\alpha - \lambda) H(Y) - \lambda H(Z) + \max\{\alpha I(X;Y), I(X;Z)\}]$.

Remark 4. Clearly for a broadcast channel if Equation (2)
holds then the conjecture holds. Even though we know that
Equation (2) may fail at some p(x) for some channels, the
conjecture states that Equation (2) holds for a sufficient class
of p(x) that is needed to compute the concave envelope.

III. Necessary conditions: beyond binary input alphabets

In this section we compute some properties of the extremal
distributions for T(X), $|\mathcal{X}| \ge 3$. To understand our approach,
it is useful to have a quick recap of the proof of Equation
(1) for binary alphabets. The main idea behind the proof is
to isolate the local maxima of the function over p(u,v|x) by a
perturbation argument, an extension of the ideas introduced in
[4]. The following facts were established in [4]: for a fixed
broadcast channel q(y,z|x), to compute

$\max_{p(u,v|x)} I(U;Y) + I(V;Z) - I(U;V)$

it suffices to consider

1) $|\mathcal{U}|, |\mathcal{V}| \le |\mathcal{X}|$, and

2) $p(x|u,v) \in \{0,1\}$, i.e. X is a function of (U,V), say
X = f(U,V).

When X is binary, there are 16 possible functions from (U,V)
to X. The proof [8] essentially boiled down to showing
that the local maxima may only exist for the following two
cases: $U = X, V = \emptyset$; $V = X, U = \emptyset$, leading to the
terms I(X;Y), I(X;Z) respectively. Indeed, in the proof,
there were only two non-trivial cases to eliminate: these were
(assume w.l.o.g. all alphabets of U, V, X are {0,1}):

$X = U \oplus V$ (XOR case), $X = U \wedge V$ (AND case).

Hence we adopt the approach of eliminating classes of functions
where the local maxima may exist and we present the
generalizations of the AND case and the XOR cases in the
next two sections.
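The count of 16 functions and their grouping can be made concrete by enumeration. The following is a minimal sketch in plain Python; the class names ("constant", "single-variable", "xor-type", "and-type") are labels introduced here for illustration, with "and-type" meaning AND up to relabeling of the symbols of U, V and X.

```python
from itertools import product

def classify(table):
    """table = (f(0,0), f(0,1), f(1,0), f(1,1)) for f: {0,1}^2 -> {0,1}."""
    ones = sum(table)
    if ones in (0, 4):
        return "constant"
    if ones == 2:
        f00, _, _, f11 = table
        if f00 == f11:
            return "xor-type"        # U xor V, or its complement
        return "single-variable"     # f depends on U only, or on V only
    return "and-type"                # exactly one or three ones in the table

counts = {}
for table in product((0, 1), repeat=4):
    kind = classify(table)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

The constant and single-variable functions correspond to the trivial cases, which leaves the XOR-type and AND-type families as the ones to eliminate.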

In the following sections we assume that p(u,v|x) achieves
T(X) and X = f(U,V). Further we assume that $q(y,z|x) > 0$
$\forall x, y, z$, i.e. we are in a dense subset of channels with non-
zero transition probabilities. In this case we can further assume
that $p(u,v) > 0$ $\forall u, v$ [13].

A. Generalization of the AND case 

In this section we deal with an extension of the AND case
from the proof of the binary inequality [8]. It says that one
cannot have one column and one row mapped to the same
input symbol.

Theorem 2. For any (U, V, X) such that X = f(U,V) and
p(u,v|x) achieves T(X) one cannot find $x_0$, $u_0$ and $v_0$ such
that $f(u_0, v) = f(u, v_0) = x_0$ for all $u \in \mathcal{U}$ and $v \in \mathcal{V}$.

Proof: Assume otherwise that $f(u_0, v) = f(u, v_0) = x_0$
for all $u \in \mathcal{U}$ and $v \in \mathcal{V}$. Consider the multiplicative
perturbation $q_{u,v,x} = p_{u,v,x}(1 + \epsilon L_{u,v})$ for some $\epsilon$ in some
interval around zero. For this to be a valid perturbation, it
has to preserve the marginal distribution of X. Therefore we
require that

$\sum_{u,v} p_{u,v,x} L_{u,v} = 0 \quad \forall x$. (3)

We can view the expression $I(U;Y) + I(V;Z) - I(U;V)$
evaluated at $q_{u,v,x}$ as a function of $\epsilon$. Non-positivity of the
second derivative at a local maximum implies

$E(E(L|U,Y)^2) + E(E(L|V,Z)^2) - E(E(L|U,V)^2) \le 0$,

where the random variable L is defined to take the value $L_{u,v}$
under the event that (U,V) = (u,v). Routine calculations
show that this condition can be rewritten as follows:

$\sum_{u,v} \frac{I_{u,v}^2}{p_{uv}} - \sum_{u} \sum_{v_1,v_2} T_{f(u,v_1),f(u,v_2),u} I_{u,v_1} I_{u,v_2} - \sum_{v} \sum_{u_1,u_2} T_{f(u_1,v),f(u_2,v),v} I_{u_1,v} I_{u_2,v} \ge 0$, (4)

where $I_{u,v} = p_{uv} L_{uv}$, $T_{x_1,x_2,u} = \sum_y \frac{p_{y|x_1} p_{y|x_2}}{p_{uy}}$, and
$T_{x_1,x_2,v}$ is defined similarly. Equation (3) can be rewritten
as

$\sum_{u,v:\, x = f(u,v)} I_{u,v} = 0 \quad \forall x$. (5)

Now, let us define $I_{u,v}$ as follows: (a) $I_{u,v} = 0$ when $u \ne u_0$
and $v \ne v_0$, (b) $I_{u_0,v} = p_{u_0,v} p_{v_0}$ when $v \ne v_0$, (c) $I_{u,v_0} = -p_{u,v_0} p_{u_0}$
when $u \ne u_0$, and (d) $I_{u_0,v_0} = p_{u_0,v_0}(p_{v_0} - p_{u_0})$.
Note that $I_{u_0,v} > 0$ for all $v \ne v_0$, and $I_{u,v_0} < 0$ for all
$u \ne u_0$ since $p_{u,v} > 0$.

It is easy to verify equation (5) for this choice. The second
derivative constraint reduces (after some manipulation) to

1 



V —I 2 > V T I 2 

/ j m 1 u,v — / j ± x Q ,x Q ,u 1 -xi,v a 



u : v: u—uq 



Pu 



(6) 



Now, using Lemma [2] (a very similar result was used in satisfy 
El) one can see that T Xo>XOiV > T Xo . Xo . V() > 



T, 



> 



x ,x ,u p „„ p„ 



and T r 



> Hence, observe that 



Pu 



> E 



Puqvq r2 i 

U,Vq * 



+ E 



PuVqPuq 



Pu 



Vo /!.„ + 



Pu vPv °' Pv Q ^ ' 



(7) 



One can verify that for our given choice of $I_{u,v}$ the right
hand side of equation (7) is equal to the left hand side of
equation (6), i.e.



E 



;,v: u—uq or v—vq 

i 



u.v 

PuV Vvq 



(y y At, Up) 



E 



E 



PuqVq j2 
Uq 

PuqvPvo 



Pv-qVq t-2 

U,' 

PuV Pu 



This implies that both equations (7) and (6) have to hold
with equality for our choice of $I_{u,v}$. Therefore all the inequalities
that we took from Lemma 2 have to hold with equality.
But this can happen only if U is independent of Y, and V
is independent of Z, i.e. I(U;Y) = I(V;Z) = 0. This is a
contradiction and completes the proof. ■

Lemma 1. If there are $u_1 \ne u_2$ such that $f(u_1, v) = f(u_2, v)$
for all $v \in \mathcal{V}$, one can find another optimizer $p(u', v|x)$, where
$I(U;Y) + I(V;Z) - I(U;V) = I(U';Y) + I(V;Z) - I(U';V)$
and furthermore $|\mathcal{U}'| < |\mathcal{U}|$. A similar condition holds if one
can find $v_1 \ne v_2$ such that $f(u, v_1) = f(u, v_2)$ for all $u \in \mathcal{U}$.

Remark 5. This lemma shows that to compute T(X) one only
needs to consider functions f(U,V) where each row (fixed U)
has a distinct mapping; similarly for columns.

Proof: Assume that $\mathcal{U} = \{u_1, u_2, \ldots, u_k\}$. Define U' as
a random variable taking values in $\{2, 3, \ldots, k\}$ as follows:
U' = i if $U = u_i$ for $i \ge 3$, and U' = 2 if $U = u_1$ or
$U = u_2$. Note that H(X|U'V) = 0 since $f(u_1, v) = f(u_2, v)$
for all $v \in \mathcal{V}$. It suffices to prove that

$I(U;Y) + I(V;Z) - I(U;V) \le I(U';Y) + I(V;Z) - I(U';V)$.

This is equivalent to showing that $I(U;V|U') \ge I(U;Y|U')$.
Since H(X|U'V) = 0, we have

$I(U;V|U') = I(U;VX|U') = I(U;VXY|U') \ge I(U;Y|U')$.

This completes the proof. ■
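The merging step in this proof can be illustrated numerically. The following is a minimal sketch, assuming NumPy; the ternary alphabets, the specific mapping `f` and the random channels are illustrative assumptions. Since the argument only uses H(X|U'V) = 0, the objective should not decrease after merging for any p(u,v), not only for an optimizer.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

def objective(p_uv, f, q_y, q_z):
    """I(U;Y) + I(V;Z) - I(U;V) when X = f[u, v]."""
    n_u, n_v = p_uv.shape
    p_uy = np.zeros((n_u, q_y.shape[1]))
    p_vz = np.zeros((n_v, q_z.shape[1]))
    for u in range(n_u):
        for v in range(n_v):
            p_uy[u] += p_uv[u, v] * q_y[f[u, v]]
            p_vz[v] += p_uv[u, v] * q_z[f[u, v]]
    return (mutual_information(p_uy) + mutual_information(p_vz)
            - mutual_information(p_uv))

rng = np.random.default_rng(2)
f = np.array([[0, 1, 2],
              [0, 1, 2],     # rows 0 and 1 coincide: f(u1, v) = f(u2, v) for all v
              [2, 0, 1]])
p_uv = rng.dirichlet(np.ones(9)).reshape(3, 3)
q_y = rng.dirichlet(np.ones(3), size=3)   # rows are q(y|x)
q_z = rng.dirichlet(np.ones(3), size=3)   # rows are q(z|x)

before = objective(p_uv, f, q_y, q_z)
# Merge u1 and u2 into one symbol of U'; the mapping keeps one copy of the shared row.
p_merged = np.vstack([p_uv[0] + p_uv[1], p_uv[2]])
after = objective(p_merged, f[1:], q_y, q_z)
print(before, after)
```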

Lemma 2. Take arbitrary $u_1, u_2, v, x$ such that $f(u_1, v) = x$
and $f(u_2, v) = x$. Then any maximizing distribution must
satisfy

$\sum_y \frac{p_{y|x}^2}{p_{u_2 y}} \ge \frac{p_{u_1 v}}{p_{u_2 v} p_{u_1}}$.

Equality implies that $p_{y|x} = p_{y|u_2} = p_{y|u_1}$ for all y.

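Proof: We start with the first derivative condition to write

$\log \frac{p_{u_1 v}}{p_{u_2 v}} \le \sum_y p_{y|x} \log \frac{p_{u_1 y}}{p_{u_2 y}} + \sum_z p_{z|x} \log \frac{p_{vz}}{p_{vz}}$
$= \sum_y p_{y|x} \log \frac{p_{u_1 y}}{p_{u_2 y}}$
$= \sum_y p_{y|x} \log \frac{p_{u_1} p_{y|u_1}}{p_{u_2 y}}$
$= \sum_y p_{y|x} \log \frac{p_{u_1} p_{y|x}}{p_{u_2 y}} - \sum_y p_{y|x} \log \frac{p_{y|x}}{p_{y|u_1}}$
$= \sum_y p_{y|x} \log \frac{p_{u_1} p_{y|x}}{p_{u_2 y}} - D(p_{y|x} \| p_{y|u_1})$
$\le \sum_y p_{y|x} \log \frac{p_{u_1} p_{y|x}}{p_{u_2 y}}$
$\le \log \sum_y p_{y|x} \frac{p_{u_1} p_{y|x}}{p_{u_2 y}}$.

The last two steps use the non-negativity of the relative entropy
and the concavity of the logarithm. Exponentiating and dividing
by $p_{u_1}$ gives the stated inequality. ■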

B. An alternate proof for the XOR case 

In this section we provide an alternative proof for the
binary XOR case, and its generalization to the non-binary case
(another extension of the XOR case has been provided in [13]).
Let us begin with the binary XOR case. Let U, V be binary
random variables, and $X = U \oplus V$. We would like to show
that under this setting, we have

$I(U;Y) + I(V;Z) \le \max(I(X;Y), I(X;Z))$.

Definition 1. Given p(u,x), let $c_{p(u,x)}$ denote the minimum
value of c such that $I(U;Y) \le c \cdot I(X;Y)$ holds for all p(y|x)
for all possible alphabets $\mathcal{Y}$. Alternatively, $c_{p(u,x)}$ is the minimum
value of c such that the function $q(x) \mapsto H(U) - cH(X)$,
when p(u|x) is fixed, matches its convex envelope at p(x).

By the data-processing inequality we know that $0 \le c_{p(u,x)} \le 1$
and the minimum is well defined.

Remark 6. Note that here we are adopting a dual notion. We
fix the auxiliary channel p(u|x) and then ask for a minimizing
c over all the forward channels.

If $c_{p(u,x)} + c_{p(v,x)} \le 1$ then note: $I(U;Y) + I(V;Z) \le c_{p(u,x)} I(X;Y) + c_{p(v,x)} I(X;Z) \le \max(I(X;Y), I(X;Z))$.

Theorem 3. For any binary U, V, X and p(u,v,x) such that
$X = U \oplus V$ the following inequality holds: $c_{p(u,x)} + c_{p(v,x)} \le 1$.

Proof: Let $p_{ij} = p(U = i, V = j)$ for $i, j \in \{0, 1\}$. Let

$\alpha := \frac{p_{00}}{p_{00} + p_{11}} = \frac{p_{00}}{p(x=0)}$ and $\beta := \frac{p_{01}}{p_{01} + p_{10}} = \frac{p_{01}}{p(x=1)}$.

Then we claim that $c_{p(u,x)} \le |\alpha - \beta|$ and $c_{p(v,x)} \le |\alpha + \beta - 1|$.
This will complete the proof since $\alpha, \beta \in [0, 1]$ implies

$|\alpha - \beta| + |\alpha + \beta - 1| \le 1$.

To show that $c_{p(u,x)} \le |\alpha - \beta|$, it suffices to show that
$q(x) \mapsto H(U) - |\alpha - \beta| H(X)$ is convex at all q(x). The proof for
$c_{p(v,x)} \le |\alpha + \beta - 1|$ is similar. Note that
$H(U) - |\alpha - \beta| H(X) = h(\alpha q(0) + \beta q(1)) - |\alpha - \beta| h(q(0))$, where $h(\cdot)$
is the binary entropy function. Thus, we need to look at the
function $x \mapsto h(\alpha x + \beta(1 - x)) - |\alpha - \beta| h(x)$ for $x \in [0, 1]$.
The second derivative is

$-\frac{(\alpha - \beta)^2}{(\alpha x + \beta(1-x))(1 - (\alpha x + \beta(1-x)))} + \frac{|\alpha - \beta|}{x(1-x)}$.

We need to verify that the above expression is non-negative,
i.e.

$(\alpha x + \beta(1-x))(1 - (\alpha x + \beta(1-x))) \ge |\alpha - \beta| x(1-x)$.

This is true because

$(\alpha x + \beta(1-x))(1 - (\alpha x + \beta(1-x)))$
$= (\alpha x + \beta(1-x))(x(1-\alpha) + (1-x)(1-\beta))$
$\ge \alpha x(1-x)(1-\beta) + \beta(1-x)x(1-\alpha)$
$= x(1-x)[\alpha(1-\beta) + \beta(1-\alpha)]$
$\ge x(1-x)|\alpha(1-\beta) - \beta(1-\alpha)|$
$= x(1-x)|\alpha - \beta|$.

■
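The two bounds in this proof can be checked on random instances. The following is a minimal sketch, assuming NumPy; the function names and the ternary output alphabet are illustrative assumptions. For X = U xor V it computes $|\alpha - \beta|$ and $|\alpha + \beta - 1|$ from p(u,v), then verifies that their sum is at most 1 and that I(U;Y) and I(V;Y) are dominated by the corresponding multiples of I(X;Y) for a random channel q(y|x).

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

def xor_bounds(p_uv):
    """Return (|alpha - beta|, |alpha + beta - 1|) for X = U xor V."""
    alpha = p_uv[0, 0] / (p_uv[0, 0] + p_uv[1, 1])   # P(U = 0 | X = 0)
    beta = p_uv[0, 1] / (p_uv[0, 1] + p_uv[1, 0])    # P(U = 0 | X = 1)
    return abs(alpha - beta), abs(alpha + beta - 1.0)

def check(trials=2000, seed=3):
    """Worst observed values of c_u + c_v, I(U;Y) - c_u I(X;Y), I(V;Y) - c_v I(X;Y)."""
    rng = np.random.default_rng(seed)
    worst_sum, worst_u, worst_v = 0.0, -np.inf, -np.inf
    for _ in range(trials):
        p_uv = rng.dirichlet(np.ones(4)).reshape(2, 2)
        c_u, c_v = xor_bounds(p_uv)
        # joint pmfs of (U, X) and (V, X) when X = U xor V
        p_ux = np.array([[p_uv[0, 0], p_uv[0, 1]], [p_uv[1, 1], p_uv[1, 0]]])
        p_vx = np.array([[p_uv[0, 0], p_uv[1, 0]], [p_uv[1, 1], p_uv[0, 1]]])
        p_x = p_ux.sum(axis=0)
        q_y = rng.dirichlet(np.ones(3), size=2)      # rows are q(y|x)
        i_xy = mutual_information(p_x[:, None] * q_y)
        worst_sum = max(worst_sum, c_u + c_v)
        worst_u = max(worst_u, mutual_information(p_ux @ q_y) - c_u * i_xy)
        worst_v = max(worst_v, mutual_information(p_vx @ q_y) - c_v * i_xy)
    return worst_sum, worst_u, worst_v

print(check())
```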

Remark 7. Note that the definition of $c_{p(u,x)}$ requires the
constraint $I(U;Y) \le c \cdot I(X;Y)$ to hold for all channels
p(y|x). If the subchannel p(y|x) is known to be an erasure
channel (i.e. Y is equal to X with some probability and erased
otherwise), we can get even smaller values for c (here $\frac{I(U;X)}{H(X)}$).

In Appendix C, we give a geometric interpretation of the
above, which yields insights for higher cardinality alphabets.

IV. CONCLUSION 

We propose a pathway for verifying the optimality of 
Marton's inner bound by trying to determine properties of 
the extremal distributions. We establish some necessary con- 
ditions, extending the work in the binary input case. We also 
add to the set of sufficient conditions. We present a few 
conjectures whose verifications have immediate consequences 
for the optimality of Marton's region. 
Appendix

A. Proof of an inequality for BSSC 

We consider the binary skew-symmetric broadcast channel
with p = 1/2 shown in Figure 1. For this channel we prove that
for all $\alpha \ge 1$ and for all $(U,V) \to X \to (Y,Z)$ we have

$\alpha I(U;Y) + I(V;Z) - I(U;V) \le \max\{\alpha I(X;Y), I(X;Z)\}$.

The proof of this claim again uses the perturbation method.
Using the same arguments as in [8] it is easy to deduce that
the AND case is the only non-trivial case. (The impossibility
of an XOR mapping being a local maximum carries over for
all binary input broadcast channels in this setting.)

Claim 2. Any $p(u,v) > 0$ such that $X = U \wedge V$ cannot
maximize (for $\alpha > 1$)

$\alpha I(U;Y) + I(V;Z) - I(U;V)$,

for the BSSC channel for a fixed p(x).

Proof: Consider a perturbation of the form p e (u, v) = 
p(uv)(l + eL(u, v)), where J2 U v L( u , v )p( u , v \ x ) — f° r a H 



x. The first derivative conditions for a local maximum imply Let I ki := p uv klL(k,l) for k, I € {0,1}. Observe that 

that loo = —loi + follows from the condition E(L\X) = 0. 

/ iq\ j q Pvz(lz)Puv(Qty _ q Computing the terms 

4- q ^ U ° g Pvz (0z)p uv (01) ~ ' rt rt 

Ky 1 >> Puv (00)+ Puv (01) p u „(10)+p u „(ll)' 



? q(y|0) l0g ^(^(DC-VdQ) = °" E ^ - + + - 



(00) Pw) (01) p u „(10)' 

ForBSSCwehave q( ,|0) = l>^{0,l }a nd q (,|l)= E(E W F) 2 ) = f + ^ + 

1,2/ = 1 and vice-versa for Z. v ' v ' 

Substituting this into first derivative conditions we obtain 1° 

2{p uv {lQ) + 2p uv {ll)Y 

ft»(oo)ft»(oi) E(E(L|y, z) ) - puvm+puv{10) + 2puvm+Puvi ny 

P u V (10)>u(0^-^00) Puy(ll)>M ia -%u^ 1 be the negative of the Hessian. Using ^oo = ^ 

— 7 — T7 — ii ttttt — /7^7\ 7TT? — n ttttt = 1- 2i i Jio + ^in, the quadratic form defined by G can be written 

P« I ,(00)^(l)(--iW(10),^(01)- ft ,(l)(--i) ftro (10) as ° Goo 7 2 i + (Got +Gf 10 )Ioi/io + Gni?o. where 

The first of the above conditions is equivalent to 11 1 



(01) p„„(00) p™(00)+p u „(10) 

Pvz{W)Puv{00) _ (Pm,(01) + 2Puv{ll))Puv{00) 2 

P„*(00)p„„(01) (p„„(00) + p„„(10))p u „(01) ' - 2p„„(01) + p m ,(ll)' 

or that r< — n ^ 

-, Gr l - WO - TTjTyT 7 

^Puv(H)Puv(00) = Puv (10)p uv (01). PUV[W) _ _ 

1 Q _ a 1 i a 1 

The second of the above conditions can be written as Puv(00) +P«v(01) p uv (10) + p uv (ll) 

1 1 a 

1 = Pu V (wrPu(Q) 2 ^Pu V m 2 Pu V (nr + p^) + puoo)~ Puv m+ Puv (oi) 

P„ y (00)«p u (l) 2 («- 1 W(10) 2 p ua (01)« _ __a a 

Puv(l0) a ( P uv(00) + PuviOl^-Vpuvm 2 2 Puv (W) 2(p o „(10) + 2p uv (ll)) 

1 1 1 



+ 



Moo) + P uv(oi)) a { P uv(io)+p uv (ii)y( a - 1 ) Puv (ioy 

(Puv(lO) +2 Puv (ll)) a Puv(W) ' p„„(00) p uv (00) + p uv (01) 

X (p u „(00)+p uv (01))° + " 1 



/i _l_ o Puv (11) \a £* ut > 

(10)+ Pm ,(ll) 2^(10) 



( 1 + ^) 2 ( 1 + ^) 2(Q - 1) ' 2(p u „(10) + 2^(11))- ^ 

Using p uv (00)p ut) (ll) = 2p m) (10)p m ,(01) we can write the 



Let x = p*"^ooj • Then from the first condition we have term G 00 as 
Puv( j]ll = 2x. The second condition becomes 11 1 

Pu«(10) Gqo = 77TTT + 



Puv 

(01) p uv (00) P uv{00)+p uv {10) 
1 = (l + 4x)« 2 

(l + 2 ;)2(l + 2 a; )2(-i)- - 2 p u „(01)+p u „(ll) 

The second derivative conditions imply the following. Note = ^ — - -\ — — — — - 

that the expression we are dealing with is essentially Puv{0l) Pm,(00) p uv (00) + p uv (10) 

Puvim 

(a - 1)H(U) + H(UV) - aH(UY) - H(VZ) ^(01)^(00) + p u „(10)) 

• TTW A TJ ( 7\ fi A (Puv{00)+Puv{0l))Puv(10) 

, since H(Y) and H(Z) are fixed. = -, j— r- y-rrz — rr j— r. 

tt , , tf „ ... ,,. ,. ,. (Puv 00 + Puv 10 ))p uv 01 p ut , 00 

Hence we would like to show that for all valid multiplicative 

perturbations, i.e. perturbations with E(L\X) = 0, as above, Hence for G t0 be P ositive semi-definite we require 
we have " - 1 a a 



Puv(l0) + p uv (ll) 2p uv (10) 2( Pu „(10)+2p u „(ll)) 
(a - 1) E(E(L|[/) 2 ) + E(L ) - aE(E(L\U, Y) ) p U v (00) 

-E(E(L|U,Z) 2 ) >0. Puv{\0){p uv {00)+p uv {0\)Y 



> 




Fig. 2. The plot of g(x) for $x \in [0, \frac{1}{2}]$



Multiplying by p uv (10) on both sides and noting that x 
p"" (oo) = \ p""(io) we can rewrite this necessary condition i 

a — 1 a a „ I 

1 + 2x 2 

This reduces to 

1 + Ax 

a < 



2(1 + Ax) ~ \l + x) 



(1 + x)Ax 

Thus for a local maximum to occur we need that there is
an $x \in (0, \infty)$ satisfying the following two conditions:

$1 < \alpha \le \frac{1 + 4x}{(1 + x)4x}$,
$1 = \frac{(1 + 4x)^{\alpha}}{(1 + x)^2 (1 + 2x)^{2(\alpha - 1)}}$.

The second condition implies that

$\left( \frac{1 + 4x}{(1 + 2x)^2} \right)^{\alpha} = \frac{(1 + x)^2}{(1 + 2x)^2}$.

For $1 < \frac{1 + 4x}{(1 + x)4x}$ (from the first condition) to hold for some
$x \in (0, \infty)$ we need $x \in (0, \frac{1}{2}]$.

Plugging the value of $\alpha$ from the second condition, we also
require that

$\log \frac{(1 + x)^2}{(1 + 2x)^2} \ge \frac{1 + 4x}{(1 + x)4x} \log \frac{1 + 4x}{(1 + 2x)^2}$.

(Note the negativity of $\log \frac{1 + 4x}{(1 + 2x)^2}$ when $x \in (0, \frac{1}{2}]$.)
This is equivalent to

$\frac{(1 + x)^2}{(1 + 2x)^2} \ge \left( \frac{1 + 4x}{(1 + 2x)^2} \right)^{\frac{1 + 4x}{(1 + x)4x}}$.

Define

$g(x) = \frac{(1 + x)^2}{(1 + 2x)^2} - \left( \frac{1 + 4x}{(1 + 2x)^2} \right)^{\frac{1 + 4x}{(1 + x)4x}}$.

Plotting g(x) in the interval $[0, \frac{1}{2}]$ we see that g(0) = 0 and
it is strictly negative and decreasing in $(0, \frac{1}{2})$. Hence there
is no x simultaneously satisfying both the first and second
derivative conditions for BSSC when $\alpha > 1$. This establishes
the AND case for BSSC.
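The behaviour of g(x) can be examined on a grid instead of a plot. The following is a minimal sketch, assuming NumPy and assuming that g has the form $g(x) = \frac{(1+x)^2}{(1+2x)^2} - \left(\frac{1+4x}{(1+2x)^2}\right)^{\frac{1+4x}{(1+x)4x}}$, which is the form implied by combining the two conditions above; the grid range and resolution are illustrative choices.

```python
import numpy as np

def g(x):
    """g(x) as assumed in the lead-in; x must be strictly positive."""
    first = (1 + x) ** 2 / (1 + 2 * x) ** 2
    base = (1 + 4 * x) / (1 + 2 * x) ** 2
    exponent = (1 + 4 * x) / ((1 + x) * 4 * x)
    return first - base ** exponent

xs = np.linspace(1e-3, 0.5, 500)   # the point x = 0 is excluded (exponent blows up)
values = g(xs)

print(values.max())                # largest value of g on the grid
print(bool(np.all(np.diff(values) < 0)))
```

On this grid g stays negative and decreases in x, in line with the claim made from the plot; a grid evaluation is only supporting evidence and not a proof.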



B. A counterexample 

We will produce a counterexample to the following statement:
for $|\mathcal{X}| = 2$ the following inequality

$\alpha I(U;Y) + I(V;Z) - I(U;V) \le \max\{\alpha I(X;Y), I(X;Z)\}$

holds for any $\alpha \ge 1$ and any Markov chain $(U,V) \to X \to (Y,Z)$.

Consider the following setting: The channels are: 



1 
0.1 0.9 



p(Y\X)=^ ° YJ- V{Z\X) 

The parameters are 

p(X) = [0.8, 0.2], a = ffi^ = 3.429517. 

The choice of a is actually the corner point for RHS. The 
result is 

LHS = 0.593020 > 0.586278 = RHS, 

where LHS is obtained by the p.m.f. and mapping X = 

f(U,V)as 



P (U,V) 



0.05930 0.00005 
0.14065 0.80000 



m v) = 



i i 

1 



This is the AND case, exactly the case we cannot prove in general.
We also plot the figure for LHS and RHS w.r.t. $\alpha$:
C. A geometric interpretation for $c_{p(u,x)}$

We know that $c_{p(u,x)}$ is the minimum value of c such that
the function $q(x) \mapsto H(U) - cH(X)$, when p(u|x) is fixed,
matches its convex envelope at p(x). The quantity $\max_{p(x)} c_{p(u,x)}$
would be the minimum value of c such that $q(x) \mapsto H(U) - cH(X)$
is completely convex in p(x) for a fixed p(u|x). Let us
fix some p(u|x) and let $c'_{p(u,x)}$ be the minimum value of c such
that the function $q(x) \mapsto H(U) - cH(X)$ is convex at p(x).
We will then have that $\max_{p(x)} c_{p(u,x)} = \max_{p(x)} c'_{p(u,x)}$.

The term $c'_{p(u,x)}$ has a nice geometric interpretation. We
begin by using the perturbation method and perturb p(u,x)
along a direction L(X) such that E[L] = 0 (therefore we are
fixing p(u|x)). The second derivative of H(U) - cH(X) along
this direction is

$-E(E(L|U)^2) + cE(E(L|X)^2) = -E(E(L|U)^2) + cE(L^2)$.

If we want this to be greater than or equal to zero, it implies
that $c \ge \frac{E(E(L|U)^2)}{E(L^2)}$ for all L(X) such that E[L] = 0.

Given a fixed sample space, $\Omega$, the set of all square-
integrable real-valued random variables forms a vector space
V with normal addition and scalar multiplication of random
variables. We define the inner product between two random
variables X and Y to be E(XY).

The set of all real-valued functions of X is itself a linear
subspace of V. Let us denote this set by $V_X$. We can similarly
define the set of all real-valued random variables that are
functions of U and denote it by $V_U$.

Now, let us use 1 to denote the random variable that takes
value 1 with probability 1. The set of square-integrable real-
valued random variables that are perpendicular to 1 are the
ones with zero expected value. Let us denote the set of these
random variables, itself a linear subspace, by $V_{1\perp}$.

Note that $1 \in V_X$ and $1 \in V_U$. Let us define the following
two subspaces:

$V'_X = V_X \cap V_{1\perp}$,
$V'_U = V_U \cap V_{1\perp}$.

Now, the perturbation method says that we should take some
L(X) in $V'_X$. Its projection onto $V_U$ is equal to E(L|U), its
squared length being $E(E(L|U)^2)$. Note that the projection
onto $V_U$ is the same as the projection onto $V'_U$ because all
the action is taking place in $V_{1\perp}$. The expression $\frac{E(E(L|U)^2)}{E(L^2)}$
is the cosine squared of the angle formed by the vector L and
its projection onto $V'_U$. The term c should dominate all such
cosine-squared values when L freely changes over $V'_X$. Thus,
$c'_{p(u,x)}$ has to be the cosine-squared of the angle between the
two subspaces $V'_U$ and $V'_X$. This is because we are taking an
arbitrary vector L in $V'_X$, then finding the vector in $V'_U$ that
has the smallest angle with L, i.e. the projection of L onto $V'_U$,
and then computing their cosine-squared expression.

Note that if the Gacs-Korner common information between
U and X is non-trivial, then the angle between the two
subspaces $V'_U$ and $V'_X$ is zero (because the intersection of
$V'_X$ and $V'_U$ will be non-trivial). Otherwise, the angle between
the two subspaces is strictly positive. It is worth noting that
the angle between the two subspaces $V'_U$ and $V'_X$ has a
symmetric definition. Therefore the minimum value of c such
that the function $q(x) \mapsto H(U) - cH(X)$ is convex at the
given p(x) is the same as the minimum value of c such that
$q(u) \mapsto H(X) - cH(U)$ is convex at the given p(u).

To compute the cosine of the angle between the two
subspaces it suffices to take two arbitrary vectors in these two
subspaces and maximize the cosine of their angle. We can
express the cosine of the angle using Pearson's correlation
coefficient between two variables:

$c'_{p(u,x)} = \left( \max_{L, T} \frac{\mathrm{Cov}(L(X), T(U))}{\sqrt{\mathrm{Var}(L(X))} \sqrt{\mathrm{Var}(T(U))}} \right)^2$.

Note that here the maximization is over arbitrary functions
L and T, and the requirement that E[L] = E[T] = 0
is relaxed. Another formula for $c'_{p(u,x)}$ is the maximum
of $(E[L(X)T(U)])^2$ over all L(X) and T(U) satisfying
$E[L(X)] = 0$, $E[L(X)^2] = 1$, $E[T(U)] = 0$ and $E[T(U)^2] = 1$.
This is a simple optimization problem and can be dealt with
using the Lagrange multipliers technique. It gives rise to other
analytical formulas for $c'_{p(u,x)}$.
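For finite alphabets this maximization can be carried out with a singular value decomposition. The following is a minimal sketch, assuming NumPy, strictly positive marginals, and the standard fact (not stated in the text) that the maximal correlation between U and X is the second largest singular value of the matrix with entries $p(u,x)/\sqrt{p(u)p(x)}$; the function name `cosine_squared` is illustrative.

```python
import numpy as np

def cosine_squared(p_ux):
    """Squared maximal correlation of (U, X) from the joint pmf p_ux[u, x]."""
    p_u = p_ux.sum(axis=1)
    p_x = p_ux.sum(axis=0)
    b = p_ux / np.sqrt(np.outer(p_u, p_x))
    singular_values = np.linalg.svd(b, compute_uv=False)  # sorted in descending order
    # The top singular value 1 corresponds to constant functions; the next one is
    # the maximum of E[L(X)T(U)] over zero-mean, unit-variance L and T.
    return float(singular_values[1] ** 2)

delta = 0.1   # crossover probability of a doubly symmetric binary pair (example)
p_ux = np.array([[(1 - delta) / 2, delta / 2],
                 [delta / 2, (1 - delta) / 2]])
print(cosine_squared(p_ux))
```

For this symmetric binary example the value is $(1 - 2\delta)^2$, the square of the correlation coefficient between U and X.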



